Neural Radiance Fields (NeRF) have been successfully used for scene representation. Recent works have also developed robot navigation and manipulation systems using NeRF-based environment representations. As object localization is foundational to many robotic applications, to further unleash the potential of NeRF in robotic systems, we study object localization in NeRF scenes. We propose NeRF-Loc, a transformer-based framework that extracts 3D bounding boxes of objects in NeRF scenes. NeRF-Loc takes a pre-trained NeRF model and camera views as input, and produces labeled 3D bounding boxes of objects as output. Specifically, we design a pair of parallel transformer encoder branches, namely a coarse stream and a fine stream, to encode both the context and the details of target objects. The encoded features are then fused with attention layers to alleviate ambiguities for accurate object localization. We have compared our method with conventional transformer-based methods, and our method achieves better performance. In addition, we also present the first NeRF-sample-based object localization benchmark, NeRFLocBench.
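The coarse/fine fusion described above can be illustrated with a minimal numpy sketch: coarse-stream tokens query fine-stream tokens through a single cross-attention layer with a residual connection. All shapes and names here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(coarse, fine):
    # Cross-attention: each coarse (context) token attends over all
    # fine (detail) tokens; residual connection keeps the coarse signal.
    d = coarse.shape[-1]
    scores = coarse @ fine.T / np.sqrt(d)   # (Nc, Nf) scaled similarities
    weights = softmax(scores, axis=-1)      # attention over fine tokens
    return coarse + weights @ fine          # fused features, shape (Nc, d)

rng = np.random.default_rng(0)
coarse = rng.normal(size=(8, 32))   # stand-in context features (coarse stream)
fine = rng.normal(size=(64, 32))    # stand-in detail features (fine stream)
fused = attention_fuse(coarse, fine)
```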
Federated learning (FL) has recently received considerable attention as a privacy-enhancing tool that allows multiple participants to jointly train a machine learning model. Prior work on FL has mainly studied how to protect label privacy during model training. However, model evaluation in FL may also lead to potential leakage of private label information. In this work, we propose an evaluation algorithm that can accurately compute the widely used AUC (area under the curve) metric when using label differential privacy (DP) in FL. Through extensive experiments, we show that our algorithm can compute AUC values that are accurate compared with the ground truth.
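For context, a minimal sketch of the two ingredients involved: the Mann-Whitney form of AUC, and randomized response as a standard mechanism for label DP (flipping each binary label with probability 1/(1+e^ε)). This is background illustration only, not the paper's evaluation algorithm.

```python
import numpy as np

def auc(scores, labels):
    # Mann-Whitney U form of AUC: the probability that a random positive
    # example is scored above a random negative example (ties count half).
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def randomized_response(labels, epsilon, rng):
    # epsilon-label-DP via randomized response: flip each binary label
    # with probability 1 / (1 + e^epsilon).
    p_flip = 1.0 / (1.0 + np.exp(epsilon))
    flips = rng.random(len(labels)) < p_flip
    return np.where(flips, 1 - labels, labels)

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=1000)
scores = labels + rng.normal(scale=1.0, size=1000)  # informative scores
noisy_labels = randomized_response(labels, epsilon=2.0, rng=rng)
```

Computing AUC directly against `noisy_labels` is biased, which is exactly why a corrected evaluation algorithm such as the one proposed above is needed.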
Differentially private (DP) data publishing is a promising technique for disseminating data without compromising the privacy of data subjects. However, most prior work has focused on the scenario where a single party owns all the data. In this paper, we focus on the multi-party setting, where different stakeholders own disjoint sets of attributes belonging to the same data subjects. In the context of linear regression that allows the parties to train a model on the complete data without being able to infer private attributes or the identities of individuals, we first directly apply the Gaussian mechanism and show that it suffers from a small-eigenvalue problem. We then propose our novel method and prove that it asymptotically converges to the optimal (non-private) solution as the dataset size increases. We substantiate the theoretical results through experiments on both artificial and real-world datasets.
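The baseline the paper criticizes can be sketched as follows: perturb the sufficient statistics X^T X and X^T y with Gaussian noise and solve the normal equations. Small or negative eigenvalues of the noisy Gram matrix can make the solve unstable; the eigenvalue clipping below is one illustrative safeguard, not the paper's proposed method. Noise scales are placeholders, not calibrated to a specific (ε, δ).

```python
import numpy as np

def dp_linear_regression(X, y, sigma, rng):
    # Gaussian mechanism on the sufficient statistics X^T X and X^T y.
    d = X.shape[1]
    noise = rng.normal(scale=sigma, size=(d, d))
    gram = X.T @ X + (noise + noise.T) / 2          # symmetric perturbation
    xty = X.T @ y + rng.normal(scale=sigma, size=d)
    # Clip small/negative eigenvalues so the noisy Gram matrix stays
    # positive definite (illustrative fix for the small-eigenvalue problem).
    w, V = np.linalg.eigh(gram)
    w = np.clip(w, 1e-3, None)
    return V @ ((V.T @ xty) / w)

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=500)
beta_dp = dp_linear_regression(X, y, sigma=1.0, rng=rng)
```

With abundant data the noise is negligible and the private estimate tracks the non-private solution, consistent with the asymptotic convergence claim.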
Connected and automated vehicles (CAVs) are increasingly being deployed, but it is currently unclear how best to deploy smart infrastructure to maximize their capabilities. One key challenge is ensuring that CAVs can reliably perceive other agents, especially occluded ones. A further challenge is the desire for smart infrastructure to be autonomous and easily scalable to wide-area deployments, similar to modern traffic lights. The present work proposes the Self-Supervised Traffic Advisor (SSTA), an infrastructure edge-device concept that leverages self-supervised video prediction in concert with a communication and co-training framework to enable autonomous traffic prediction across a smart city. An SSTA is a statically mounted camera overlooking an intersection or area of complex traffic flow; it predicts traffic flow as future video frames and learns to communicate with neighboring SSTAs so as to predict traffic before it appears in its own field of view (FOV). The proposed framework aims at three goals: (1) inter-device communication to enable high-quality predictions, (2) scalability to an arbitrary number of devices, and (3) lifelong online learning to ensure adaptability to changing environments. Finally, an SSTA can directly broadcast its predicted future video frames, which CAVs can post-process for their own control purposes.
In this paper, we address the problem of predicting the trajectory of an egocentric camera wearer (ego person) in crowded spaces. Trajectory prediction ability learned from real-world data of different camera wearers walking in crowded surroundings can be transferred to assist visually impaired people in navigation, and to instill human navigation behaviors in mobile robots, enabling better human-robot interaction. To this end, a new egocentric human trajectory prediction dataset was constructed, containing real trajectories of people navigating crowded spaces while wearing a camera, along with rich extracted contextual data. We extract and utilize three different modalities to predict the camera wearer's trajectory, namely his/her past trajectory, the past trajectories of nearby people, and the environment such as the scene semantics or the depth of the scene. A transformer-based encoder-decoder neural network model, integrated with a novel cascaded cross-attention mechanism that fuses the multiple modalities, has been designed to predict the camera wearer's future trajectory. Extensive experiments have been conducted, and the results show that our model outperforms the state-of-the-art methods in egocentric human trajectory prediction.
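The idea of cascading cross-attention over the three modalities can be sketched as below: ego-trajectory tokens first attend to neighbor-trajectory tokens, and the result then attends to environment tokens. The shapes, ordering of the cascade, and single-head form are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query, keys):
    # Single-head cross-attention: query tokens attend over key tokens,
    # with a residual connection.
    d = query.shape[-1]
    w = softmax(query @ keys.T / np.sqrt(d), axis=-1)
    return query + w @ keys

rng = np.random.default_rng(3)
ego = rng.normal(size=(10, 16))        # wearer's past-trajectory tokens
neighbors = rng.normal(size=(30, 16))  # nearby people's past trajectories
scene = rng.normal(size=(50, 16))      # scene semantics / depth tokens
# Cascade: fuse social context first, then environmental context.
fused = cross_attend(cross_attend(ego, neighbors), scene)
```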
We propose Exe-GAN, a novel exemplar-guided facial inpainting framework using generative adversarial networks. Our approach can not only preserve the quality of the input facial image but also complete the image with exemplar-like facial attributes. We achieve this by simultaneously leveraging the global style of the input image, the stochastic style generated from a random latent code, and the exemplar style of the exemplar image. We introduce a novel attribute similarity metric to encourage the network to learn the style of facial attributes from the exemplar in a self-supervised way. To ensure natural transitions across region boundaries, we introduce a novel spatially variant gradient backpropagation technique to adjust the loss gradients based on spatial location. Extensive evaluations on the public CelebA-HQ and FFHQ datasets, as well as practical applications, validate the superiority of Exe-GAN in terms of the visual quality of facial inpainting.
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT exhibits strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
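One common way to encode 3D point coordinates into token features is a sinusoidal positional encoding added to each modality's tokens; the sketch below illustrates that general idea only and is not necessarily CMT's exact encoding scheme.

```python
import numpy as np

def encode_points(points, d):
    # Sinusoidal positional encoding of 3D coordinates.
    # d must be divisible by 6 (a sin and a cos per axis per frequency).
    freqs = 2.0 ** np.arange(d // 6)                 # geometric frequencies
    angles = points[:, :, None] * freqs              # (N, 3, d//6)
    pe = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return pe.reshape(points.shape[0], d)

rng = np.random.default_rng(5)
points = rng.normal(size=(5, 3))    # reference 3D points
tokens = rng.normal(size=(5, 12))   # stand-in image or point-cloud tokens
aligned = tokens + encode_points(points, 12)  # position-aware tokens
```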
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
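The Koopman viewpoint underlying KNO can be illustrated in its simplest linear form: fit a single operator K that advances system states one step by least squares, then roll it forward for long-term prediction. This DMD-style toy sketch is background for the operator-learning idea, not the KoopmanLab implementation.

```python
import numpy as np

def fit_koopman(states):
    # Least-squares linear Koopman approximation (DMD-style):
    # find K minimizing ||K X_t - X_{t+1}|| over snapshot pairs.
    X, Y = states[:, :-1], states[:, 1:]
    return Y @ np.linalg.pinv(X)

def rollout(K, x0, steps):
    # Long-term prediction by repeatedly applying the learned operator.
    out = [x0]
    for _ in range(steps):
        out.append(K @ out[-1])
    return np.stack(out, axis=1)

# Toy linear dynamics (a planar rotation), recovered exactly by the fit.
theta = 0.1
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
x0 = np.array([1.0, 0.0])
traj = np.stack([np.linalg.matrix_power(A, t) @ x0 for t in range(50)], axis=1)
K = fit_koopman(traj)
pred = rollout(K, x0, steps=49)
```

For nonlinear PDE dynamics, KNO-style methods learn the operator in a lifted (neural-network) observable space rather than directly on raw states as here.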
Rankings are widely collected in various real-life scenarios, leading to the leakage of personal information such as users' preferences on videos or news. To protect rankings, existing works mainly develop privacy protection on a single ranking within a set of rankings, or on pairwise comparisons of a ranking, under $\epsilon$-differential privacy. This paper proposes a novel notion called $\epsilon$-ranking differential privacy for protecting ranks. We establish the connection between the Mallows model (Mallows, 1957) and the proposed $\epsilon$-ranking differential privacy. This allows us to develop a multistage ranking algorithm to generate synthetic rankings while satisfying the developed $\epsilon$-ranking differential privacy. Theoretical results regarding the utility of synthetic rankings in downstream tasks, including the inference attack and the personalized ranking tasks, are established. For the inference attack, we quantify how $\epsilon$ affects the estimation of the true ranking based on synthetic rankings. For the personalized ranking task, we consider varying privacy preferences among users and quantify how their privacy preferences affect the consistency in estimating the optimal ranking function. Extensive numerical experiments are carried out to verify the theoretical results and demonstrate the effectiveness of the proposed synthetic ranking algorithm.
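A synthetic ranking can be drawn from a Mallows model stage by stage with the repeated-insertion method, which the sketch below illustrates. The dispersion `phi` controls how tightly samples concentrate around the center ranking; how `phi` maps to the paper's $\epsilon$-ranking DP guarantee is not reproduced here, so treat the parameterization as illustrative.

```python
import numpy as np

def sample_mallows(center, phi, rng):
    # Repeated-insertion sampling from a Mallows model (Kendall-tau distance).
    # phi in (0, 1]: smaller phi concentrates mass near the center ranking.
    sample = []
    for i, item in enumerate(center, start=1):
        # Insert the i-th item at position j in 1..i with prob ∝ phi**(i - j).
        weights = phi ** (i - np.arange(1, i + 1))
        probs = weights / weights.sum()
        j = rng.choice(i, p=probs)       # 0-based insertion index
        sample.insert(j, item)
    return sample

rng = np.random.default_rng(4)
center = [0, 1, 2, 3, 4]
synthetic = sample_mallows(center, phi=0.3, rng=rng)
```

Each draw is a permutation of the items; as `phi` approaches 0 the sample collapses onto the center ranking (no noise), and `phi = 1` gives a uniform random permutation.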
Due to their ability to offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, the lack of research on explicitly quantifying the data quality of each view when fusing them leaves these models unexplainable, and they perform unsatisfactorily and inflexibly in downstream remote sensing tasks. To fill this gap, in this paper, evidential deep learning is introduced to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value which describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
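The evidential idea can be sketched briefly: each view's class evidence defines a Dirichlet distribution whose strength yields an uncertainty value, and fusion then down-weights the riskier view. The linear uncertainty weighting below is a simplified stand-in for the paper's decision-level fusion strategy, with made-up evidence vectors.

```python
import numpy as np

def view_uncertainty(evidence):
    # Subjective-logic quantities for one view: with alpha = evidence + 1
    # and Dirichlet strength S = sum(alpha), uncertainty u = K / S.
    alpha = evidence + 1.0
    S = alpha.sum()
    belief = evidence / S
    u = len(evidence) / S
    return belief, u

def fuse_views(evidences):
    # Decision-level fusion: views with lower uncertainty get more weight
    # (a simplified illustration of uncertainty-aware fusion).
    beliefs, us = zip(*(view_uncertainty(e) for e in evidences))
    weights = np.array([1.0 - u for u in us])
    weights /= weights.sum()
    return sum(w * b for w, b in zip(weights, beliefs))

aerial = np.array([9.0, 1.0, 0.0])   # strong evidence -> low decision risk
ground = np.array([0.2, 0.3, 0.1])   # weak evidence -> high decision risk
fused = fuse_views([aerial, ground])
```

Here the confident aerial view dominates the fused decision, while the uncertain ground view contributes little, which is the intended behavior of credibility-aware fusion.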